Nonlinear growth models represent an instance of nonlinear regression models, a class of models taking the general form \[ y = \mu(x, \theta) + \epsilon, \] where \(\mu(x, \theta)\) is the mean function which depends on a possibly vector-valued parameter \(\theta\), and a possibly vector-valued predictor \(x\). The stochastic component \(\epsilon\) represents the error with mean zero and constant variance. Usually, a Gaussian distribution is also assumed for the error term.
By defining the mean function \(\mu(x, \theta)\) we may obtain several different models, all characterized by the fact that parameters \(\theta\) enter in a nonlinear way into the equation. Parameters are usually estimated by nonlinear least squares which aims at minimizing the residual sum of squares.
\[ \mu(x) = \theta_1 \exp\{\theta_2 x\} \] where \(\theta_1\) is the value at the origin (i.e. \(\mu(x=0)\)), and \(\theta_2\) represents the (constant) relative ratio of change (i.e. \(\frac{d\mu(x)}{dx }\frac{1}{\mu(x)} = \theta_2\)). Thus, the model describes an increasing (exponential growth if \(\theta_2 > 0\)) or decreasing (exponential decay if \(\theta_2 < 0\)) trend with constant relative rate.
\[ \mu(x) = \frac{\theta_1}{1+\exp\{(\theta_2 - x)/\theta_3\}} \] where \(\theta_1\) is the upper horizontal asymptote, \(\theta_2\) represents the x-value at the inflection point of the symmetric growth curve, and \(\theta_3\) represents a scale parameter (and \(1/\theta_3\) is the growth-rate parameter that controls how quickly the curve approaches the upper asymptote).
\[ \mu(x) = \theta_1 \exp\{-\theta_2 \theta_3^x\} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the value of the function at \(x = 0\) (displacement along the x-axis), and \(\theta_3\) represents a scale parameter.
The difference between the logistic and Gompertz functions is that the latter is not symmetric around the inflection point.
\[ \mu(x) = \theta_1 (1 - \exp\{-\theta_2 x\})^{\theta_3} \] where \(\theta_1\) is the horizontal asymptote, \(\theta_2\) represents the rate of growth, and \(\theta_3\) in part determines the point of inflection on the y-axis.
Dipartimento della Protezione Civile: COVID-19 Italia - Monitoraggio della situazione http://arcg.is/C1unv
Source: https://github.com/pcm-dpc/COVID-19
url = "https://raw.githubusercontent.com/pcm-dpc/COVID-19/master/dati-andamento-nazionale/dpc-covid19-ita-andamento-nazionale.csv"
COVID19IT <- read.csv(file = url, stringsAsFactors = FALSE)
COVID19IT$data <- as.Date(COVID19IT$data)
DT::datatable(COVID19IT)Warnings
- 29/03/2020: dati Regione Emilia Romagna parziali (dato tampone non aggiornato).
- 26/03/2020: dati Regione Piemonte parziali (-50 deceduti - comunicazione tardiva)
- 18/03/2020: dati Regione Campania non pervenuti.
- 18/03/2020: dati Provincia di Parma non pervenuti.
- 17/03/2020: dati Provincia di Rimini non aggiornati
- 16/03/2020: dati P.A. Trento e Puglia non pervenuti.
- 11/03/2020: dati Regione Abruzzo non pervenuti.
- 10/03/2020: dati Regione Lombardia parziali.
- 07/03/2020: dati Brescia +300 esiti positivi
# create data for analysis
data = data.frame(date = COVID19IT$data,
y = COVID19IT$totale_casi)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 2680.329919 325.646217 8.231 0.00000000167 ***
## th2 0.105602 0.003904 27.047 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4487 on 33 degrees of freedom
##
## Number of iterations to convergence: 8
## Achieved convergence tolerance: 0.000002995mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 128479.35449 2201.89162 58.35 <2e-16 ***
## xmid 29.02328 0.21309 136.20 <2e-16 ***
## scal 5.40403 0.07979 67.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 690.4 on 32 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000003174mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 1000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 266112.398238 15115.533676 17.61 <2e-16 ***
## b2 8.743490 0.204128 42.83 <2e-16 ***
## b3 0.939722 0.002027 463.59 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 742.3 on 32 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000001482richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "plinear",
control = nls.control(maxiter = 1000, tol = 0.1))
# algorithm is not converging...
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 437893.270791 77577.438216 5.645 3.04e-06 ***
## th2 0.036909 0.004485 8.230 2.11e-09 ***
## th3 4.626867 0.323489 14.303 1.88e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 978.5 on 32 degrees of freedom
##
## Number of iterations to convergence: 12
## Achieved convergence tolerance: 0.001436
# library(nlmrt)
# mod4 = nlxb(y ~ th1*(1 - exp(-th2*x))^th3,
# data = data, start = start, trace = TRUE)models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -342.9479 | 3 | 0.9834640 | 691.8957 | 692.6699 | 696.5618 | |
| Logistic model | -276.9000 | 4 | 0.9995898 | 561.8000 | 563.1334 | 568.0214 | *** |
| Gompertz model | -279.4376 | 4 | 0.9995040 | 566.8751 | 568.2085 | 573.0965 | |
| Richards model | -289.1052 | 4 | 0.9992038 | 586.2104 | 587.5437 | 592.4318 |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Infected", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 5000),
minor_breaks = seq(0, coef(mod2)[1], by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top")last_plot() +
scale_y_continuous(trans = "log10", limits = c(100,NA)) +
labs(y = "Infected (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,c("fit2", "fit3")]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 10000),
minor_breaks = seq(0, max(ylim), by = 5000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 36 2020-03-30 120015 106936 133717
## 361 2020-03-30 100769 98866 102431
## 362 2020-03-30 104729 102509 107193
## 363 2020-03-30 105489 102856 108720
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Infected", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 10000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19IT$data,
y = COVID19IT$deceduti)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 139.764446 18.139220 7.705 0.00000000713 ***
## th2 0.126482 0.004085 30.965 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 396.1 on 33 degrees of freedom
##
## Number of iterations to convergence: 9
## Achieved convergence tolerance: 0.000005423mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 15494.18940 447.89002 34.59 <2e-16 ***
## xmid 31.14506 0.30791 101.15 <2e-16 ***
## scal 4.91158 0.09972 49.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 93.85 on 32 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000007761mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
# manually set starting values
# start = list(Asym = coef(mod2)[1])
# tmp = list(y = log(log(start$Asym) - log(data$y)), x = data$x)
# b = unname(coef(lm(y ~ x, data = tmp)))
# start = c(start, c(b2 = exp(b[1]), b3 = exp(b[2])))
# mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data, start = start,
# control = nls.control(maxiter = 10000))
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 40600.985410 2622.114322 15.48 <2e-16 ***
## b2 11.232608 0.262584 42.78 <2e-16 ***
## b3 0.940692 0.001859 506.15 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 61.05 on 32 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000002063richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 70419.320398 11426.045399 6.163 0.00000067759934 ***
## th2 0.038245 0.003472 11.016 0.00000000000202 ***
## th3 6.147262 0.353064 17.411 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70 on 32 degrees of freedom
##
## Number of iterations to convergence: 9
## Achieved convergence tolerance: 0.000001704models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -257.9888 | 3 | 0.9889961 | 521.9777 | 522.7519 | 526.6437 | |
| Logistic model | -207.0538 | 4 | 0.9993500 | 422.1076 | 423.4409 | 428.3290 | |
| Gompertz model | -192.0017 | 4 | 0.9996832 | 392.0035 | 393.3368 | 398.2248 | *** |
| Richards model | -196.7940 | 4 | 0.9996005 | 401.5879 | 402.9212 | 407.8093 |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Deceased", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 500),
minor_breaks = seq(0, coef(mod2)[1], by = 100)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top")last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Deceased (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 36 2020-03-30 13271 12167 14594
## 361 2020-03-30 11292 10961 11490
## 362 2020-03-30 11710 11537 11905
## 363 2020-03-30 11781 11557 12003
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Deceased", color = "Model") +
scale_y_continuous(minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# create data for analysis
data = data.frame(date = COVID19IT$data,
y = COVID19IT$dimessi_guariti)
data$x = as.numeric(data$date) - min(as.numeric(data$date)) + 1
DT::datatable(data, options = list("pageLength" = 5))mod1_start = lm(log(y) ~ x, data = data)
b = unname(coef(mod1_start))
start = list(th1 = exp(b[1]), th2 = b[2])
exponential <- function(x, th1, th2) th1 * exp(th2 * x)
mod1 = nls(y ~ exponential(x, th1, th2), data = data, start = start)
summary(mod1)
##
## Formula: y ~ exponential(x, th1, th2)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 198.484951 24.913255 7.967 0.00000000344 ***
## th2 0.122095 0.003966 30.789 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 492.1 on 33 degrees of freedom
##
## Number of iterations to convergence: 10
## Achieved convergence tolerance: 0.000001994mod2 = nls(y ~ SSlogis(x, Asym, xmid, scal), data = data)
summary(mod2)
##
## Formula: y ~ SSlogis(x, Asym, xmid, scal)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 18815.1955 637.5913 29.51 <2e-16 ***
## xmid 30.9753 0.3722 83.23 <2e-16 ***
## scal 5.0557 0.1211 41.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 136.9 on 32 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.0000005189mod3 = nls(y ~ SSgompertz(x, Asym, b2, b3), data = data)
summary(mod3)
##
## Formula: y ~ SSgompertz(x, Asym, b2, b3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## Asym 51188.880747 6983.815791 7.33 0.0000000247 ***
## b2 10.415547 0.442357 23.55 < 2e-16 ***
## b3 0.943370 0.003687 255.85 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 154.9 on 32 degrees of freedom
##
## Number of iterations to convergence: 0
## Achieved convergence tolerance: 0.000005813richards <- function(x, th1, th2, th3) th1*(1 - exp(-th2*x))^th3
Loss <- function(th, y, x) sum((y - richards(x, th[1], th[2], th[3]))^2)
start <- optim(par = c(coef(mod2)[1], 0.001, 1), fn = Loss,
y = data$y, x = data$x)$par
names(start) <- c("th1", "th2", "th3")
mod4 = nls(y ~ richards(x, th1, th2, th3), data = data, start = start,
# trace = TRUE, # algorithm = "port",
control = nls.control(maxiter = 1000))
summary(mod4)
##
## Formula: y ~ richards(x, th1, th2, th3)
##
## Parameters:
## Estimate Std. Error t value Pr(>|t|)
## th1 95597.629963 34564.413387 2.766 0.00935 **
## th2 0.034316 0.006896 4.976 0.000021286133 ***
## th3 5.519867 0.615124 8.974 0.000000000299 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 172.6 on 32 degrees of freedom
##
## Number of iterations to convergence: 18
## Achieved convergence tolerance: 0.000007685models = list("Exponential model" = mod1,
"Logistic model" = mod2,
"Gompertz model" = mod3,
"Richards model" = mod4)
tab = data.frame(loglik = sapply(models, logLik),
df = sapply(models, function(m) attr(logLik(m), "df")),
Rsquare = sapply(models, function(m)
cor(data$y, fitted(m))^2),
AIC = sapply(models, AIC),
AICc = sapply(models, AICc),
BIC = sapply(models, BIC))
sel <- apply(tab[,4:6], 2, which.min)
tab$"" <- sapply(tabulate(sel, nbins = length(models))+1, symnum,
cutpoints = 0:4, symbols = c("", "*", "**", "***"))
knitr::kable(tab)| loglik | df | Rsquare | AIC | AICc | BIC | ||
|---|---|---|---|---|---|---|---|
| Exponential model | -265.5866 | 3 | 0.9881606 | 537.1731 | 537.9473 | 541.8392 | |
| Logistic model | -220.2714 | 4 | 0.9989385 | 448.5427 | 449.8761 | 454.7641 | *** |
| Gompertz model | -224.5872 | 4 | 0.9986657 | 457.1744 | 458.5077 | 463.3958 | |
| Richards model | -228.3827 | 4 | 0.9984053 | 464.7653 | 466.0987 | 470.9867 |
ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(aes(y = fitted(mod1), color = "Exponential")) +
geom_line(aes(y = fitted(mod2), color = "Logistic")) +
geom_line(aes(y = fitted(mod3), color = "Gompertz")) +
geom_line(aes(y = fitted(mod4), color = "Richards")) +
labs(x = "", y = "Recovered", color = "Model") +
scale_color_manual(values = cols) +
scale_y_continuous(breaks = seq(0, coef(mod2)[1], by = 500),
minor_breaks = seq(0, coef(mod2)[1], by = 100)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top")last_plot() +
scale_y_continuous(trans = "log10", limits = c(10,NA)) +
labs(y = "Recovered (log10 scale)")df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1),
fit1 = predict(mod1, newdata = df),
fit2 = predict(mod2, newdata = df),
fit3 = predict(mod3, newdata = df),
fit4 = predict(mod4, newdata = df))
ylim = c(0, max(df[,-(1:3)]))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = df, aes(x = date, y = fit1, color = "Exponential")) +
geom_line(data = df, aes(x = date, y = fit2, color = "Logistic")) +
geom_line(data = df, aes(x = date, y = fit3, color = "Gompertz")) +
geom_line(data = df, aes(x = date, y = fit4, color = "Richards")) +
coord_cartesian(ylim = ylim) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 1000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))# compute prediction using Moving Block Bootstrap (MBB) for nls
df = data.frame(x = seq(min(data$x), max(data$x)+14))
df = cbind(df, date = as.Date(df$x, origin = data$date[1]-1))
pred1 = cbind(df, "fit" = predict(mod1, newdata = df))
pred1[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod1, df[df$x > max(data$x),])[,2:3]
pred2 = cbind(df, "fit" = predict(mod2, newdata = df))
pred2[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod2, df[df$x > max(data$x),])[,2:3]
pred3 = cbind(df, "fit" = predict(mod3, newdata = df))
pred3[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod3, df[df$x > max(data$x),])[,2:3]
pred4 = cbind(df, "fit" = predict(mod4, newdata = df))
pred4[df$x > max(data$x), c("lwr", "upr")] = predictMBB.nls(mod4, df[df$x > max(data$x),])[,2:3]
# predictions for next day
pred = rbind(subset(pred1, x == max(data$x)+1, select = 2:5),
subset(pred2, x == max(data$x)+1, select = 2:5),
subset(pred3, x == max(data$x)+1, select = 2:5),
subset(pred4, x == max(data$x)+1, select = 2:5))
print(pred, digits = 3)
## date fit lwr upr
## 36 2020-03-30 16093 14834 17780
## 361 2020-03-30 13732 13403 14052
## 362 2020-03-30 14273 13860 14717
## 363 2020-03-30 14354 13884 14955
ylim = c(0, max(pred2$upr, pred3$upr, na.rm=TRUE))ggplot(data, aes(x = date, y = y)) +
geom_point() +
geom_line(data = pred1, aes(x = date, y = fit, color = "Exponential")) +
geom_line(data = pred2, aes(x = date, y = fit, color = "Logistic")) +
geom_line(data = pred3, aes(x = date, y = fit, color = "Gompertz")) +
geom_line(data = pred4, aes(x = date, y = fit, color = "Richards")) +
geom_ribbon(data = pred1, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[1], alpha=0.3) +
geom_ribbon(data = pred2, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[2], alpha=0.3) +
geom_ribbon(data = pred3, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[3], alpha=0.3) +
geom_ribbon(data = pred4, aes(x = date, ymin = lwr, ymax = upr),
inherit.aes = FALSE, fill = cols[4], alpha=0.3) +
coord_cartesian(ylim = c(0, max(ylim))) +
labs(x = "", y = "Recovered", color = "Model") +
scale_y_continuous(breaks = seq(0, max(ylim), by = 5000),
minor_breaks = seq(0, max(ylim), by = 1000)) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = cols) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))df = data.frame(date = COVID19IT$data,
swabs = c(NA, diff(COVID19IT$tamponi)),
positives = COVID19IT$nuovi_attualmente_positivi)
df$x = as.numeric(df$date) - min(as.numeric(df$date)) + 1
df$r = df$positives/df$swabs
df = subset(df, swabs > 50)
graph1 <- ggplot(df, aes(x = date)) +
geom_point(aes(y = swabs, color = "swabs"), pch = 19) +
geom_line(aes(y = swabs, color = "swabs")) +
geom_point(aes(y = positives, color = "positives"), pch = 15) +
geom_line(aes(y = positives, color = "positives")) +
labs(x = "", y = "Number of cases", color = "") +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
scale_color_manual(values = palette()[c(2,1)]) +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))
graph2 <- ggplot(df, aes(x = date, y = r)) +
geom_smooth(method = "loess", se = TRUE, col = "darkgrey") +
geom_point(col=palette()[4]) +
geom_line(size = 0.5, col=palette()[4]) +
labs(x = "", y = "New positives / swabs") +
scale_y_continuous(labels = scales::percent_format()) +
scale_x_date(date_breaks = "2 day", date_labels = "%b%d",
minor_breaks = "1 day") +
theme_bw() +
theme(legend.position = "top",
axis.text.x = element_text(angle=60, hjust=1))
grid.arrange(graph1, graph2, nrow = 2, ncol = 1, widths = 1, heights = c(0.6,0.4))